Abstract

Lasso is a popular method for sparse linear regression, especially
for problems in which $p > n$. But when $p \gg n$, existing results in the
literature on the model selection consistency of lasso always require
special and strong conditions, including information about the unknown
true coefficients. An important question is: can the lasso solution
select the true variables without such strict conditions?

In this paper, we investigate a new line of argument leading to the
model selection consistency of lasso. One important but more standard
and much weaker condition, the Eigenvalue Condition, is proposed. We
prove that the probability of lasso selecting wrong variables
decays at an exponential rate in ultra-high dimensional settings without
restrictions other than the Eigenvalue Condition. Since penalized least
squares methods share a similar solution framework, this technical tool
can be extended to other methods with a similar structure. In the
different dimensional settings, we show the different performance of lasso
under different assumptions on the noise terms. Simulations
are carried out to illustrate our results.

Keywords: Lasso; Model Selection; Eigenvalue Condition. 

1 Introduction 

R. Tibshirani [13] proposed the lasso (least absolute shrinkage and
selection operator) method for simultaneous model selection and estimation
of the regression parameters. It is very popular for high-dimensional
estimation problems because of its statistical accuracy for prediction and
variable selection, coupled with its computational feasibility. Sparse
models are very common in high-dimensional settings, where the
dimensionality can be larger than the sample size. Moreover, see [15], [16],
under some sufficient conditions the lasso solution is unique, and the
number of nonzero elements of the lasso solution never exceeds n (when
n < p). In recent years, data of this kind have become more and more common
in most fields. Similar properties can also be seen in other penalized
least squares methods, since they share a similar solution framework.
Consequently, these types of methods are suitable for sparse data.

Consider the problem of model selection and estimation in the sparse linear
regression model

$Y_n = X_n \beta_n + \epsilon_n$, (1)

where $\epsilon_n = (\epsilon_{1,n}, \epsilon_{2,n}, \ldots, \epsilon_{n,n})'$
is a vector of i.i.d. random variables with mean 0 and variance $\sigma^2$.
$X_n$ is the $n \times p$ design matrix of predictor variables.
$\beta_n \in \mathbb{R}^p$ is a vector of true regression coefficients and is
commonly assumed to be sparse, with only a small proportion of nonzeros.
Without loss of generality, write
$\beta_n = (\beta_{1,n}, \ldots, \beta_{q,n}, \beta_{q+1,n}, \ldots, \beta_{p,n})'$,
where $\beta_{j,n} \neq 0$ for $j = 1, \ldots, q$ and $\beta_{j,n} = 0$
for $j = q+1, \ldots, p$. Then write
$\beta_n^{(1)} = (\beta_{1,n}, \ldots, \beta_{q,n})'$ and
$\beta_n^{(2)} = (\beta_{q+1,n}, \ldots, \beta_{p,n})'$;
that is, only the first q entries are nonvanishing. Let
$S_n = \{j \in \{1, 2, \ldots, p\} : \beta_{j,n} \neq 0\}$; then the true
model is denoted by $S_n = \{1, \ldots, q\}$, and
$S_n^c = \{q+1, \ldots, p\}$ represents the set of noise variables.

The lasso estimator is defined as

$\hat\beta_n(\lambda_n) \in \arg\min_{\beta \in \mathbb{R}^p} \{ \frac{1}{2} \|Y_n - X_n \beta\|_2^2 + \lambda_n \|\beta\|_1 \}$, (2)

where $\lambda_n$ is the tuning parameter, which controls the amount of
regularization. When $\lambda_n = 0$, the lasso problem reduces to ordinary
least squares. As $\lambda_n$ increases, $\hat\beta_n$ becomes sparser; that
is, a large $\lambda_n$ shrinks $\hat\beta_n$ to 0. Set
$\hat S_n = \{j \in \{1, 2, \ldots, p\} : \hat\beta_{j,n} \neq 0\}$, the set
of predictors selected by the lasso estimator $\hat\beta_n$. Consequently,
$\hat S_n$ and $\hat\beta_n$ both depend on $\lambda_n$, and model selection
consistency means the correct recovery of the set $S_n$ of $\beta_n$. That is,

$P(\hat S_n = S_n) \to 1$, as $n \to \infty$. (3)
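
To make definitions (2) and (3) concrete, the following is a minimal sketch of a coordinate-descent solver for the objective in (2), together with the selected set. It is not the authors' implementation (the simulations in Section 4 use the lars algorithm). The function names `soft_threshold`, `lasso_cd` and `selected_set`, the fixed number of sweeps and the tolerance are illustrative assumptions.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimise 0.5 * ||y - X b||_2^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows, each a list of p floats; y is a list of n floats.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the model
            # rho = X_j' (partial residual that excludes coordinate j)
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            step = new_bj - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * step
                beta[j] = new_bj
    return beta


def selected_set(beta, tol=1e-10):
    """Indices (1-based, as in the text) of the nonzero coefficients."""
    return {j + 1 for j, b in enumerate(beta) if abs(b) > tol}
```

Comparing `selected_set(lasso_cd(X, y, lam))` with the true index set $\{1, \ldots, q\}$ gives the event whose probability appears in (3).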

On the model selection front of the lasso estimator, [19] established the
Irrepresentable condition on the generating covariance matrices for the
lasso's model selection consistency. This condition was also discovered by
H21 HZl I2D]. Using the language of [H], the Irrepresentable condition is
defined as $|C_{21} C_{11}^{-1} \mathrm{sign}(\beta_{(1)})| \le 1 - \eta$,
where sign(·) maps a positive entry to 1, a negative entry to −1 and zero to
zero. The definitions of $C_{21}$ and $C_{11}$ are given in Section 2. When
the signs of the true coefficients are unknown, the $\ell_1$ norms of the
regression coefficients need to be smaller than 1.
In other directions, [12] showed that the lasso is consistent in estimating
the dependency between Gaussian variables under a set of conditions. [0]
showed estimation consistency for the lasso in the classical fixed p
setting. [15] investigated issues concerning the uniqueness of lasso
solutions. ms proposed a covariance test statistic for the lasso, which is
based on the lasso fitted values.

Beyond the lasso, regularization methods have also been widely used for
high-dimensional model selection, e.g. see [H121 El CUm 113 ESI I2H122].
Ed studied the oracle properties and argued that a good procedure should
have these oracle properties$^1$. m established a nonasymptotic weak oracle
property and investigated the penalized least squares estimator with
folded-concave penalty functions.

There has been a considerable amount of recent work dedicated to the lasso
problem and to regularization methods. Yet the requirements on the
deterministic amount of regularization that gives consistent selection are
very strong. Consequently, in this paper, we consider a new way to obtain
the model selection consistency of the lasso. We investigate a new
condition, the Eigenvalue Condition, which is more standard and easier to
meet. Under this condition, in our proof, the probability that the lasso
selects the true model is bounded below by the probability of

$A_n = \{ \|W_n\|_\infty \le \lambda_n/\sqrt{n} \}$, (4)

where $W_n = X_n' \epsilon_n/\sqrt{n}$. The event $A_n$ is simple and
intuitive, and its probability is easy to calculate. With this line of
proof, we do not need other constraints on the predictors.
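
As an illustration of how simple the event in (4) is to evaluate for a given design and noise vector, here is a small sketch in plain Python. The function names `w_vector` and `event_a_holds` are hypothetical, and the inputs are assumed to be lists of floats (X as a list of n rows of length p).

```python
import math


def w_vector(X, eps):
    """W_n = X' eps / sqrt(n), returned as a list of length p."""
    n, p = len(X), len(X[0])
    root_n = math.sqrt(n)
    return [sum(X[i][j] * eps[i] for i in range(n)) / root_n for j in range(p)]


def event_a_holds(X, eps, lam):
    """True when ||W_n||_inf <= lam / sqrt(n), i.e. when the event A_n in (4) occurs."""
    n = len(X)
    sup_norm = max(abs(w) for w in w_vector(X, eps))
    return sup_norm <= lam / math.sqrt(n)
```

In a simulation, averaging `event_a_holds` over repeated draws of the noise gives a Monte Carlo estimate of $P(A_n)$ for a chosen $\lambda_n$.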

1 An estimator with the oracle property correctly identifies the set of
nonzero components of $\beta_n$ with probability tending to 1 and, at the
same time, estimates the nonzero components accurately.
In the simulation part, we discuss the effectiveness of the Eigenvalue
Condition for the lasso and compare it with the Irrepresentable Condition.
Four examples are given in that section; the Irrepresentable Condition fails
in all four settings, but in two of them the lasso selects the variables
correctly since the Eigenvalue Condition holds.

Besides that, we discuss the influence of different assumptions on the
noise terms $\epsilon_{i,n}$ on model selection consistency. Gaussian or
subgaussian errors$^2$ are standard but impose a strong tail condition. One
basic assumption is that the errors are independent and identically
distributed with zero mean and finite variance. Bernstein's inequality,
Hoeffding's inequality and similar results are used for such noise terms in
this paper to bound the decay rate of the probability that $A_n$ fails.

The rest of the paper is organized as follows. First, in Section 2, we
introduce the Eigenvalue Condition and a bound on the probability of the
lasso choosing a wrong model when the Eigenvalue Condition holds. Then, to
demonstrate the advantages of this bound, Section 3 considers different
dimensional settings and different assumptions on the noise terms, and
shows that the lasso is model selection consistent in these situations
under mild conditions. Section 4 presents the results of the simulation
studies. Finally, the Appendix presents the proof of the main theorem.

2 Eigenvalue Condition 

To derive the theoretical results, we write $X_n(1)$ and $X_n(2)$ for the
first q and the last p − q columns of $X_n$, respectively, since the first q
entries of $\beta_n$ are assumed nonzero. Let $C_n = \frac{1}{n} X_n' X_n$.
Partition $C_n$ as

$C_n = \begin{pmatrix} C_{11,n} & C_{12,n} \\ C_{21,n} & C_{22,n} \end{pmatrix}$, (5)

where $C_{11,n}$ is a $q \times q$ matrix assumed to be invertible. Set
$W_n = X_n' \epsilon_n/\sqrt{n}$. Similarly, $W_n^{(1)}$ and $W_n^{(2)}$
denote the first q and last p − q elements of $W_n$.

2 e.g. $P(|\epsilon_{i,n}| > t) \le C e^{-ct^2}$, $\forall t \ge 0$.

Let $\Lambda_{\min}(C_{11,n}) > 0$ denote the smallest eigenvalue of
$C_{11,n}$. We now define the following Eigenvalue Condition.

Eigenvalue Condition. There exists a constant $\delta \in (0,1)$ such that,
for all $i = 1, \ldots, q$ and $j = q+1, \ldots, p$,

$|X_{i,n}' X_{j,n}| < (1 - \delta) \Lambda_{\min}(C_{11,n})$. (6)
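
The condition can be checked numerically for a given design. The sketch below is one possible reading, not the authors' code: it computes $\Lambda_{\min}(C_{11,n})$ with a small Jacobi eigenvalue routine and compares it with the largest cross product between a true column and a noise column. Whether the cross products should be divided by n (so that they are entries of $C_{12,n}$) is an assumption, exposed through the hypothetical `normalize` flag; all function names are illustrative.

```python
import math


def gram_entry(X, i, j, scale):
    """scale * (column i)' (column j) of X, with 0-based column indices."""
    return scale * sum(row[i] * row[j] for row in X)


def min_eigenvalue(A, sweeps=100, tol=1e-24):
    """Smallest eigenvalue of a symmetric matrix via cyclic Jacobi rotations."""
    m = len(A)
    A = [row[:] for row in A]  # work on a copy
    for _ in range(sweeps):
        off = sum(A[r][s] ** 2 for r in range(m) for s in range(m) if r != s)
        if off <= tol:
            break
        for r in range(m - 1):
            for s in range(r + 1, m):
                if A[r][s] == 0.0:
                    continue
                # Rotation angle that zeroes the (r, s) entry.
                theta = 0.5 * math.atan2(2.0 * A[r][s], A[s][s] - A[r][r])
                c, sn = math.cos(theta), math.sin(theta)
                for k in range(m):  # update columns r and s (A J)
                    akr, aks = A[k][r], A[k][s]
                    A[k][r] = c * akr - sn * aks
                    A[k][s] = sn * akr + c * aks
                for k in range(m):  # update rows r and s (J' A J)
                    ark, ask = A[r][k], A[s][k]
                    A[r][k] = c * ark - sn * ask
                    A[s][k] = sn * ark + c * ask
    return min(A[r][r] for r in range(m))


def eigenvalue_condition_holds(X, q, delta, normalize=True):
    """Check max_{i<=q<j} |X_i' X_j| (optionally / n) < (1 - delta) * Lambda_min(C_11).

    Requires 1 <= q < p. normalize=True is an assumption, see the lead-in.
    """
    n, p = len(X), len(X[0])
    c11 = [[gram_entry(X, i, j, 1.0 / n) for j in range(q)] for i in range(q)]
    lam_min = min_eigenvalue(c11)
    scale = 1.0 / n if normalize else 1.0
    cross = max(abs(gram_entry(X, i, j, scale))
                for i in range(q) for j in range(q, p))
    return cross < (1.0 - delta) * lam_min
```

The check uses only the design matrix and q, which reflects the point that no information about the signs or sizes of the true coefficients is required.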

Remark 1. This condition is the only condition for the model selection
consistency of the lasso, and it is also the crucial one. It can roughly be
seen as a regularity constraint on the maximum and the minimum eigenvalue
of $C_{11,n}$ and $C_{12,n}$, respectively, both of which are typical
assumptions in the sparse linear regression literature; see for example
0 0 G3 12f. Consequently, we call it the Eigenvalue Condition.


Remark 2. A similar condition can be seen in m, called the Irrepresentable
condition$^3$, which is defined as

$|C_{21} C_{11}^{-1} \mathrm{sign}(\beta_{(1)})| \le 1 - \eta$, (7)

where sign(·) maps a positive entry to 1, a negative entry to −1 and zero to
zero. Compared with that, the Eigenvalue Condition is a weaker constraint on
the design matrix and, above all, it does not need information about the
true coefficients, which are always unknown.

3 Or neighborhood stability condition [12]; a similar restraint can also be
seen in [20].
Then, when the Eigenvalue Condition holds, the following theorem describes
the relationship between the probability of the lasso choosing the true
model and the probability of $\{\|W_n\|_\infty \le \lambda_n/\sqrt{n}\}$.
That is, $P(A_n)$ is a lower bound on the probability of the lasso picking
the true model.

Theorem 1. Assume the Eigenvalue Condition holds; then

$P(\hat S_n = S_n) \ge P(\|W_n\|_\infty \le \lambda_n/\sqrt{n})$. (8)

Remark 3. Theorem 1 is a key technical tool in this paper. It puts a lower
bound on the probability of the lasso selecting the true model, and this
bound is intuitive and easy to calculate. The regularization parameter
$\lambda_n$ trades off the size of this event.

Remark 4. Besides that, Theorem 1 can be applied in a wide range of
dimensional settings. There is no restriction on p and q. That is, we can
discuss the behavior of the lasso in terms of model selection consistency
under different dimensional settings in the following.

3 Model selection consistency 

Now we consider the decay rate of the probability
$P(\|W_n\|_\infty > \lambda_n/\sqrt{n})$, which is an upper bound on
$P(\hat S_n \neq S_n)$. We discuss different types of dimensional settings
and different assumptions on the noise terms in this section.

Two types of dimensional settings are considered. First of all, we choose
the general high-dimensional setting, i.e. $p = O(n^c)$ and $q = O(n^b)$,
where b < c. If c < 1, it is equivalent to a low-dimensional setting in
which p tends to infinity.

Under this setting, we can obtain the model selection consistency of the
lasso without further constraints on the noise terms.
Then we consider a more complicated situation, the ultra-high dimensional
setting, i.e. $p = O(e^{n^c})$ where 0 < c < 1. Under this setting, we need
more assumptions on $\epsilon_n$ to make
$P(\|W_n\|_\infty \le \lambda_n/\sqrt{n}) \to 1$. The Gaussian assumption is
a simple and common one, but it imposes a strong tail condition.

Before discussing the detailed rate of the lower bound, we give the
following regularity condition:

(C.1) $n^{-1} X_{j,n}' X_{j,n} \le 1/\sigma^2$ for $j = 1, \ldots, p$.

(C.1) is a typical assumption in the sparse linear regression literature,
since it can be achieved by normalizing the covariates; see for example PES].

3.1 General high-dimensional setting $p = O(n^c)$

In this part, we consider the general high-dimensional setting ($p \gg n$)
and show the model selection consistency of the lasso as follows.

Theorem 2. Assume the $\epsilon_i$ are i.i.d. random variables with mean 0
and variance $\sigma^2$. If $p = O(n^c)$ and $q = O(n^b)$ where $b \le c$,
(C.1) holds, and $\lambda_n/\sqrt{n} \ge n^{\eta/2}$ with $\eta > c$, then
under the Eigenvalue Condition we have

$P(\hat S_n = S_n) \ge 1 - n^{-\eta + c} \to 1$, as $n \to \infty$. (9)

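Proof. By Theorem 1, we have

$P(\hat S_n = S_n) \ge P(\|W_n\|_\infty \le \lambda_n/\sqrt{n})$. (10)

Consequently, we consider the probability of
$\{\|W_n\|_\infty > \lambda_n/\sqrt{n}\}$. If (C.1) holds, by Markov's
inequality we easily get

$P(\|W_n\|_\infty > \lambda_n/\sqrt{n}) \le \sum_{j=1}^{p} P(|W_{j,n}| > \lambda_n/\sqrt{n})$
$= \sum_{j=1}^{p} P(|X_{j,n}' \epsilon_n/\sqrt{n}| > \lambda_n/\sqrt{n})$
$\le (\lambda_n/\sqrt{n})^{-2} \cdot n^c \le n^{-\eta + c} \to 0$, as $n \to \infty$. (11)

□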
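
The Markov step in (11) can be spelled out for a single coordinate. The display below is a sketch of that intermediate calculation, using only (C.1) and the assumption that the $\epsilon_i$ are i.i.d. with mean 0 and variance $\sigma^2$; the shorthand $t = \lambda_n/\sqrt{n}$ is introduced here for brevity and is not part of the original notation.

```latex
\begin{align*}
P\bigl(|W_{j,n}| > t\bigr)
  &\le \frac{E\,W_{j,n}^2}{t^2}
   = \frac{\sigma^2\, n^{-1} X_{j,n}' X_{j,n}}{t^2}
   \le \frac{1}{t^2},
   \qquad t = \lambda_n/\sqrt{n}, \\
\sum_{j=1}^{p} P\bigl(|W_{j,n}| > t\bigr)
  &\le p\, t^{-2}.
\end{align*}
```

With p of order $n^c$ and $t \ge n^{\eta/2}$, the right-hand side is of order $n^{c - \eta}$, which tends to zero because $\eta > c$.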

The proof of Theorem 2 is simple and intuitive. It states that if the
Eigenvalue Condition holds, the lasso can select the true model with almost
no other precondition in the sparse high-dimensional setting. Similarly,
under the classical setting where p, q and $\beta$ are fixed as
$n \to \infty$, we have

Corollary 1. For fixed p, q and $\beta$, under the regularity assumption
(C.1), assume the $\epsilon_i$ are i.i.d. random variables with mean 0 and
variance $\sigma^2$. If the Eigenvalue Condition holds and
$\lambda_n/\sqrt{n} \to \infty$, then

$P(\hat S_n = S_n) \to 1$, as $n \to \infty$. (12)

Since Corollary 1 can also be proved by Markov's inequality, the proof is
omitted here.

On the other hand, consider the Gaussian assumption on the noise terms. If
the $\epsilon_i$ are i.i.d. Gaussian random variables and the dimensional
setting is $p = O(n^c)$, $q = O(n^b)$ where $b \le c$, then, following the
setting of Theorem 2, we have

$P(\hat S_n \neq S_n) \le P(\|W_n\|_\infty > \lambda_n/\sqrt{n})$
$\le \sum_{j=1}^{p} P(|W_{j,n}| > \lambda_n/\sqrt{n})$
$< n^{c - \eta/2} e^{-\frac{1}{2} n^{\eta}} = o(e^{-\frac{1}{2} n^{\eta}}) \to 0$, as $n \to \infty$, (13)

where the last inequality holds by the Gaussian tail probability bound
$P(|\epsilon_i| \ge t) < t^{-1} e^{-t^2/2}$, $\forall t > 0$. It can be
relaxed to the subgaussian assumption, i.e.
$P(|\epsilon_i| \ge t) \le C e^{-ct^2}$, $\forall t \ge 0$.
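
The Gaussian tail bound quoted above can be checked numerically against the exact two-sided tail of a standard normal variable. The sketch below uses only the Python standard library; the function names are illustrative, and `union_bound` takes the constant hidden in $p = O(n^c)$ to be 1, which is an assumption made purely for illustration.

```python
import math


def gaussian_two_sided_tail(t):
    """Exact P(|Z| >= t) for a standard normal Z, via the complementary error function."""
    return math.erfc(t / math.sqrt(2.0))


def tail_bound(t):
    """The bound t^{-1} * exp(-t^2 / 2); requires t > 0."""
    return math.exp(-0.5 * t * t) / t


def union_bound(n, c, eta):
    """p * tail_bound(t) with p = n^c and t = n^(eta / 2).

    Assumption: the constant in p = O(n^c) is taken to be 1.
    """
    t = n ** (eta / 2.0)
    return (n ** c) * tail_bound(t)
```

Evaluating `union_bound` for increasing n shows how quickly the right-hand side of the union bound decays once the exponential factor dominates the polynomial factor $n^c$.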

3.2 Ultra-high dimensional setting $p = O(e^{n^c})$

In this part, we consider the ultra-high dimensional setting
$p = O(e^{n^c})$ where c > 0. Furthermore, we discuss the different
situations under different assumptions on the noise terms (Gaussian and
non-Gaussian). The key technical tool used in the non-Gaussian case is
Bernstein's inequality and related theory.

We shall make use of the following condition:

(C.2) Assume $\epsilon_{1,n}, \ldots, \epsilon_{n,n}$ are independent random
variables with mean 0, and the following inequality is satisfied for
$j = 1, \ldots, p$:

$\frac{1}{n} E|W_{j,n}|^m \le \frac{m!}{2} L^{m-2}$, $m = 2, 3, \ldots$, (14)

where $W_{j,n} = \frac{1}{\sqrt{n}} X_{j,n}' \epsilon_n$ and
$L \in (0, \infty)$.

(C.2) is used for Bernstein's inequality. We then have the following result
for the ultra-high dimensional setting under the non-Gaussian assumption.
Theorem 3. Assume the $\epsilon_i$ are i.i.d. random variables with mean 0
and variance $\sigma^2$. If (C.2) holds, $p = O(e^{n^c})$ and $q = O(n^b)$
where $1 \ge b$, $c > 0$, the Eigenvalue Condition holds, and
$\lambda_n/\sqrt{n} > Kn(1 + t)$ with $K > 0$, then for all $t > 0$,

$P(\hat S_n = S_n) \ge 1 - e^{-nt} \to 1$, as $n \to \infty$. (15)



Proof. By Bernstein's inequality, when (C.2) holds, we have

$P(\max_{1 \le j \le p} |\frac{1}{n} W_{j,n}| \ge Lt + \sqrt{2t} + a(L,n,p)) \le \exp[-nt]$. (16)

By the setting of p, when n is large enough,

$a(L,n,p) = \sqrt{2\log(2p)/n} + L\log(2p)/n \to 0$. (17)

Let $K \in (0, \infty)$ be such that the following inequalities hold for all
$t > 0$:

$a(L,n,p) < K$, $Lt + \sqrt{2t} \le Kt$. (18)

Then we have

$P(\|W_n\|_\infty > Kn(1 + t)) \le e^{-nt}$, (19)

completing the proof.

□
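
The behaviour of the term $a(L,n,p)$ in the proof can be illustrated numerically. The sketch below evaluates $\sqrt{2\log(2p)/n} + L\log(2p)/n$ for $p = e^{n^c}$, working with $\log(2p) = \log 2 + n^c$ directly so that p itself is never formed. The function names and the specific values of L, n and c used when calling them are illustrative assumptions.

```python
import math


def log_2p_ultra_high(n, c):
    """log(2p) for p = exp(n^c), computed without forming p."""
    return math.log(2.0) + n ** c


def a_term(L, n, log_2p):
    """sqrt(2 * log(2p) / n) + L * log(2p) / n."""
    return math.sqrt(2.0 * log_2p / n) + L * log_2p / n
```

For c < 1 the ratio $\log(2p)/n$ behaves like $n^{c-1}$ and the term shrinks as n grows, whereas for c = 1 it approaches $\sqrt{2} + L$ instead of zero, which is why the ultra-high dimensional setting restricts c to be below 1.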


Similarly to the general high-dimensional setting, assume
$\lambda_n/\sqrt{n} \ge n^{\eta}$ where $2\eta > c$. If (C.1) holds, the
Gaussian assumption on the noise terms gives

$P(\hat S_n \neq S_n) \le P(\|W_n\|_\infty > \lambda_n/\sqrt{n})$
$\le \sum_{j=1}^{p} P(|W_{j,n}| > \lambda_n/\sqrt{n})$
$= O(n^{-\eta} e^{n^c - \frac{1}{2} n^{2\eta}}) = o(e^{-n^{\eta}}) \to 0$, as $n \to \infty$. (20)
4 Simulation part


In this section, we evaluate the finite sample properties of the lasso
estimator with synthetic data. We start with the behavior of the lasso under
different settings, and then consider the relationship between n, p and q,
and the different noise terms.

4.1 Eigenvalue Condition

This first part illustrates two simple cases (low dimension vs. high
dimension) to show the effect of the Eigenvalue Condition on the lasso.
Each case describes two different settings, leading to the lasso's model
selection consistency and inconsistency when the Eigenvalue Condition holds
and fails, respectively. As a contrast, we introduce the Irrepresentable
condition in this part; it fails in all the settings.

Example 1.

In the low-dimensional case, assume there are n = 100 observations and
the number of parameters is chosen as p = 3, q = 2, that is,

$\beta = \{2, 3, 0\}$. (21)

We generate the response y by

$y = X_1 \beta_1 + X_2 \beta_2 + X_3 \beta_3 + \epsilon$, (22)

where $X_1$, $X_2$ and $\epsilon$ are i.i.d. random variables from the
Gaussian distribution with mean 0 and variance 1. The third predictor $X_3$
is generated to be correlated with the other predictors in the following
two cases:

$X_3 = \frac{2}{3} X_1 + \frac{2}{3} X_2 + \frac{1}{3} e$, (23)
and

$X_3 = \frac{1}{2} X_1 + \frac{1}{2} X_2 + \frac{1}{\sqrt{2}} e$, (24)

where e is an i.i.d. random variable with the same distribution as
$\epsilon$.

$X_3$ is thus also Gaussian with mean 0 and variance 1. We find that the
Eigenvalue Condition fails for the first case and holds for the second one.
The Irrepresentable condition fails in both situations.

Example 2.

We construct a high-dimensional case with p = 400, q = 4 and n = 100.
The true parameters are set as

$\beta = \{2, 3, 1, 4, 0, 0, \ldots, 0\}$, (25)

and the response y is generated by

$y = X\beta + \epsilon$, (26)

where $X = (X_1, \ldots, X_p)$ is a 100 × 400 matrix, and the elements of X
are i.i.d. random variables from the Gaussian distribution with mean 0 and
variance 1, except for $X_{400}$. The last predictor $X_{400}$ is generated
according to the following two settings:

-Woo = -Ai + -X 2 + -X 3 + -A 4 + -A 5 + -A 6 + -X 7 + -e, (27) 

and 

-^400 = -jAi + -X 2 + -X 3 + -A 4 + -X 5 + -X 6 + -X 7 + -e, (28) 

where e follows the same setting as in Example 1. Hence $X_{400}$ is also
Gaussian with mean 0 and variance 1. We find that the Eigenvalue Condition
fails for the first high-dimensional case and holds for the second. Besides
that, the Irrepresentable condition fails in both situations.
We get different lasso solutions for the above four cases (the lasso path
is computed by the lars algorithm [4]). See Figure 1: clearly, the two left
examples do not satisfy the Eigenvalue Condition and hence cannot select
variables correctly (both graphs select irrelevant variables, e.g. $X_3$ in
the first one and $X_{400}$ in the second). In contrast, the two right
examples select the true variables when the Eigenvalue Condition holds.


[Figure 1: four lasso path plots, horizontal axis |beta|/max|beta|.
Panels: (a) EC fails; (b) EC holds; (c) EC fails; (d) EC holds.]

Figure 1: An example to illustrate the effect of EC on Lasso's
(in)consistency in model selection.
4.2 Relationship between p, q and n


In this part, we give a simple and direct view of the relationship
between n, p and q, that is, between the sparsity and the sample size.

The nonzero elements $\beta^{(1)}$ are set as

{9, 6, 8, 12, 19, 8, 19, 9, 6, 8, 12, 19, 8, 19}. (29)

If the number of nonzero elements is less than 14, we take the values
in sequence. The remaining elements are set to zero. The numbers of
observations and parameters are chosen as in Table 1, and the predictors
are generated from the Gaussian distribution. In this table, the lasso
selects the right variables in the first six examples and the wrong ones
in the rest.

All settings considered here are high-dimensional. The results indicate
that when q increases, the sample size needs to increase too. In contrast,
the number of zero elements has less influence on the lasso's
(in)consistency in model selection.


Table 1: Example settings.

Example     n      p     q   |  Example     n      p     q
   1       100    400    4   |     7       100    400    5
   2       100    500    5   |     8       100    500    6
   3       200    500    7   |     9       100    500    7
   4       200   1000    7   |    10       100   1000    7
   5       500    500   14   |    11       100   2000    7
   6       500   2000   14   |    12       300   2000   14
4.3 Different noise terms


In the last part, we consider a high-dimensional example with different
assumptions on the noise terms.

Data from the high-dimensional linear regression model are generated as

$y_i = X_i' \beta + \epsilon_i$, (30)

where the data have n = 100 observations and the number of parameters is
p = 1000. The true regression coefficient vector is fixed as

$\beta = \{9, 6, 8, 0, \ldots, 0\}$. (31)

For the distribution of the noise $\epsilon$, we consider four
distributions: Gaussian with mean 0 and variance 1; exponential with rate
1; uniform with minimum 0 and maximum 1; and Student's t with 100 degrees
of freedom.

The results are depicted in Figure 2. They show that, with standardized
data and strong sparsity, the lasso always chooses the right model
regardless of the distribution of the noise terms.
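
The four noise distributions listed above can be drawn with the Python standard library alone. The sketch below is illustrative rather than the authors' simulation code: the function name `draw_noise`, the string labels and the use of a seeded `random.Random` instance are assumptions. The exponential and uniform draws are generated exactly as the text describes them and are therefore not centered at zero.

```python
import math
import random


def draw_noise(kind, n, rng):
    """Draw n noise values from one of the four distributions of Section 4.3.

    rng is a random.Random instance, so that runs can be reproduced.
    """
    if kind == "gaussian":      # mean 0, variance 1
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    if kind == "exponential":   # rate 1
        return [rng.expovariate(1.0) for _ in range(n)]
    if kind == "uniform":       # minimum 0, maximum 1
        return [rng.uniform(0.0, 1.0) for _ in range(n)]
    if kind == "student_t":     # 100 degrees of freedom
        df = 100
        # t = Z / sqrt(V / df), with V chi-square(df) = Gamma(shape df/2, scale 2).
        return [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)
                for _ in range(n)]
    raise ValueError("unknown noise distribution: " + str(kind))
```

For example, `draw_noise("student_t", 100, random.Random(0))` returns one noise vector of the length used in this section.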


Appendix 

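Proof of Theorem 1. Recall the lasso estimator

$\hat\beta_n(\lambda_n) \in \arg\min_{\beta \in \mathbb{R}^p} \{ \frac{1}{2} \|Y_n - X_n \beta\|_2^2 + \lambda_n \|\beta\|_1 \}$. (32)

Let $u_n = \sqrt{n}(\hat\beta_n - \beta_n)$ and

$F_n(\beta_n) = \frac{1}{2} \|Y_n - X_n \beta_n\|_2^2 + \lambda_n \|\beta_n\|_1$. (33)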
[Figure 2: four lasso path plots, horizontal axis |beta|/max|beta|.
Panels: (a) Gaussian assumption; (b) Exponential distribution;
(c) Uniform distribution; (d) Student's t.]

Figure 2: An example to illustrate the lasso's behavior in the
high-dimensional setting with different assumptions on the noise terms.

Define $V_n(u_n) = F_n(\hat\beta_n) - F_n(\beta_n)$,
$C_n = \frac{1}{n} X_n' X_n$ and $W_n = \frac{1}{\sqrt{n}} X_n' \epsilon_n$.
Then $V_n(u_n)$ can be written as

$V_n(u_n) = \frac{1}{2} u_n' C_n u_n - u_n' W_n + \lambda_n ( \|\beta_n + u_n/\sqrt{n}\|_1 - \|\beta_n\|_1 )$. (34)
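
For readers who want the intermediate step, the quadratic and linear terms of $V_n(u_n)$ follow by substituting $Y_n = X_n \beta_n + \epsilon_n$ and $\hat\beta_n = \beta_n + u_n/\sqrt{n}$ into the squared loss. The display below is a sketch of that algebra, assuming the loss is scaled by 1/2 and that $C_n = X_n' X_n / n$ and $W_n = X_n' \epsilon_n/\sqrt{n}$ as defined in Section 2.

```latex
\begin{align*}
\tfrac12\|Y_n - X_n\hat\beta_n\|_2^2 - \tfrac12\|Y_n - X_n\beta_n\|_2^2
 &= \tfrac12\bigl\|\epsilon_n - X_n u_n/\sqrt{n}\bigr\|_2^2 - \tfrac12\|\epsilon_n\|_2^2 \\
 &= \tfrac12\, u_n'\Bigl(\tfrac1n X_n'X_n\Bigr)u_n
    - u_n'\Bigl(\tfrac{1}{\sqrt n}X_n'\epsilon_n\Bigr) \\
 &= \tfrac12\, u_n' C_n u_n - u_n' W_n .
\end{align*}
```

Adding the difference of the penalty terms, $\lambda_n(\|\beta_n + u_n/\sqrt{n}\|_1 - \|\beta_n\|_1)$, gives the full expression for $V_n(u_n)$.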

Let $\beta_n^{(1)}$, $\beta_n^{(2)}$ and $W_n^{(1)}$, $W_n^{(2)}$ denote the
first q and last p − q elements of $\beta_n$ and $W_n$, respectively.
Similarly, $u_n^{(1)}$ and $u_n^{(2)}$ denote the first q and last p − q
elements of $u_n$.
Conditioned on $A_n = \{\|W_n\|_\infty \le \lambda_n/\sqrt{n}\}$ and setting
$0 < \delta < 1$, we have

v„(u n ) > [fiWcVfiW + *f>C 22 , n if] - ||W«|U ■ ||fiW|j 


A 


J^ 1)|l + l ^ >l [Jk |w?>1 


j=q +1 


A, 


( 35 ) 


Since $u_n^{(2)\prime} C_{22,n} u_n^{(2)} \ge 0$ and
$\Lambda_{\min}(C_{11,n})$ denotes the smallest eigenvalue of $C_{11,n}$,
we have

1 — <5 


14(^4) ^ | |A 


Ar 


(!) I 
n I 




m 


(i) 


n 


E 

i=9+i 

1-5 


m 


'J,n I 


Ar 


n 


-I1C, 


.7,™! 


S 5 ll“!, 1) || I —S—- HWnlloo - 


A, 


j=q +1 


Ar 


54-1 (4 iiw 


(36) 


Since $\|W_n\|_\infty \le \lambda_n/\sqrt{n}$ under $A_n$, for simplicity of
presentation, when n is large enough we consider
$\sqrt{n}\|W_n\|_\infty/\lambda_n = o(1)^4$, which means that
$\|W_n\|_\infty$ can be omitted when compared with $\lambda_n/\sqrt{n}$.


Define


M, = Y~j A m'n(Cll,n) A 


77 , 


(37) 


Hence $V_n(u_n) > 0$ holds on

$\{\|u_n^{(1)}\| > M_n\}$. (38)

Since $V_n(0) = 0$, the minimum of $V_n(u_n)$ cannot be attained at
$\|u_n^{(1)}\| > M_n$. Next, consider
$\{u_n \in \mathbb{R}^p : \|u_n^{(1)}\| \le M_n, u_n^{(2)} \neq 0\}$. If the
Eigenvalue Condition holds, the following inequality holds uniformly on
this set.


',( 2 ) 


4 Or to say, $A_n$ can be refined to
$A_n^* = \{\|W_n\|_\infty < K_n \lambda_n/\sqrt{n}\}$ where $K_n = o(1)$,
e.g. $K_n = (\sqrt{\log\log n})^{-1}$.
- v n (u^\ 0) = i(i* l 1| )'Ci2,„fl*, 2) + i(«i 2, )'C22,„ii, 2) 

+ %fi< 2) H - (*'W 

'n 


> E 

j=(?+i 

> 0. 




• 7,^1 




( 39 ) 


where the last inequality holds since the Eigenvalue Condition holds. Hence
the minimum of $V_n(u_n)$ cannot be attained at $u_n^{(2)} \neq 0$ either,
and we have the following result:

$\arg\min_{u_n \in \mathbb{R}^p} V_n(u_n) \in \{ u_n \in \mathbb{R}^p : \|u_n^{(1)}\| \le M_n, u_n^{(2)} = 0 \}$, (40)

completing the proof.

□